Foreword



This Technical Specification has been produced by the 3rd Generation Partnership Project (3GPP).

The present document gives the detailed requirements for the correct operation of the background acoustic noise 
evaluation, noise parameter encoding/decoding and comfort noise generation within the digital cellular 
telecommunications system. The present document is part of a series covering the half rate speech traffic channels as 
described below: 

GSM 06.02 "Digital cellular telecommunications system (Phase 2+); Half rate speech; Half rate speech processing functions".

GSM 06.06 "Digital cellular telecommunications system (Phase 2+); Half rate speech; ANSI-C code for the GSM half rate speech codec".

GSM 06.07 "Digital cellular telecommunications system (Phase 2+); Half rate speech; Test sequences for the GSM half rate speech codec".

GSM 06.20 "Digital cellular telecommunications system (Phase 2+); Half rate speech; Half rate speech transcoding".

GSM 06.21 "Digital cellular telecommunications system (Phase 2+); Half rate speech; Substitution and muting of lost frames for half rate speech traffic channels".

GSM 06.22 "Digital cellular telecommunications system (Phase 2+); Half rate speech; Comfort noise aspects for half rate speech traffic channels".

GSM 06.41 "Digital cellular telecommunications system (Phase 2+); Half rate speech; Discontinuous Transmission (DTX) for half rate speech traffic channels".

GSM 06.42 "Digital cellular telecommunications system (Phase 2+); Half rate speech; Voice Activity Detector (VAD) for half rate speech traffic channels".

The contents of the present document are subject to continuing work within the TSG and may change following formal 
TSG approval. Should the TSG modify the contents of the present document, it will be re-released by the TSG with an 
identifying change of release date and an increase in version number as follows: 

Version x.y.z 

where: 

x the first digit:

1 presented to TSG for information; 

2 presented to TSG for approval; 

3 or greater indicates TSG approved document under change control. 

y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, 
updates, etc. 

z the third digit is incremented when editorial only changes have been incorporated in the document. 
1 Scope



The present document gives the detailed requirements for the correct operation of the background acoustic noise 
evaluation, noise parameter encoding/decoding and comfort noise generation in GSM Mobile Stations (MSs) and Base
Station Systems (BSSs) during Discontinuous Transmission (DTX) on half rate speech traffic channels.

The requirements described in the present document are mandatory for implementation in all GSM MSs capable of 
supporting the half rate speech traffic channel. 

The receiver requirements are mandatory for implementation in all GSM BSSs capable of supporting the half rate
speech traffic channel; the transmitter requirements apply only to those BSSs where downlink DTX will be used.



2 References



The following documents contain provisions which, through reference in this text, constitute provisions of the present 
document. 

• References are either specific (identified by date of publication, edition number, version number, etc.) or 
non-specific. 

• For a specific reference, subsequent revisions do not apply. 

• For a non-specific reference, the latest version applies. In the case of a reference to a 3GPP document (including a 
GSM document), a non-specific reference implicitly refers to the latest version of that document in the same 
Release as the present document. 

[1] GSM 01.04: "Digital cellular telecommunications system (Phase 2+); Abbreviations and acronyms".

[2] GSM 06.20: "Digital cellular telecommunications system (Phase 2+); Half rate speech transcoding".

[3] GSM 06.41: "Digital cellular telecommunications system (Phase 2+); Half rate speech; Discontinuous Transmission (DTX) for half rate speech traffic channels".

[4] GSM 06.42: "Digital cellular telecommunications system (Phase 2+); Half rate speech; Voice Activity Detector (VAD) for half rate speech traffic channels".

[5] GSM 06.06: "Digital cellular telecommunications system (Phase 2+); Half rate speech; ANSI-C code for the GSM half rate speech codec".



3 Definitions, symbols and abbreviations 

3.1 Definitions 

For the purposes of the present document, the following terms and definitions apply. 

frame: time interval of 20 ms corresponding to the time segmentation of the half rate speech transcoder, also used as a 
short term for a traffic frame. 

H(z): combination of the short term (spectral) filter A(z) and the spectral weighting filter W(z).

SID codeword: fixed bit pattern for labelling a traffic frame as a SID frame. 

SID field: bit positions of the SID codeword within a SID frame. 

SID frame: frame characterized by the SID (Silence Descriptor) codeword. It conveys information on the acoustic 
background noise. 
SP flag: speech flag.

speech frame: traffic frame that cannot be classified as a SID frame.

VAD flag: Voice Activity Detector flag. 

W(z): spectral weighting filter of the GSM half rate speech codec.

Other definitions of terms used in the present document can be found in GSM 06.20 [2] and GSM 06.41 [3]. The overall 
operation of DTX is described in GSM 06.41 [3]. 

3.2 Symbols 

For the purposes of the present document, the following symbols apply: 

GS Energy tweak parameter.

R0 Frame energy value.

R(i) Unquantized (normalized) autocorrelation sequence.

r_i Optimal reflection coefficient.

$\sum_{n=a}^{b} x(n) = x(a) + x(a+1) + \dots + x(b-1) + x(b)$ (Accumulation).

GSP0 codeword Vector quantization index, joint vector quantization of the parameters GS and P0.

P0 Power contribution of the first excitation vector as a fraction of the total excitation power at a subframe.

3.3 Abbreviations 

For the purposes of the present document, the following abbreviations apply: 

AFLAT Autocorrelation Fixed Point LAttice Technique (used in the GSM half rate speech codec for the vector quantization of the LPC coefficients)

BSS Base Station System 

DTX Discontinuous Transmission 

ETS European Telecommunication Standard 

GSM Global System for Mobile communications 

MS Mobile Station 

SID Silence Descriptor 

RX Receive 

TX Transmit 

VAD Voice Activity Detector 

VQ Vector Quantization 

For abbreviations not given in this subclause, see GSM 01.04 [1]. 



4 General



A problem when using DTX is that the background acoustic noise, which is transmitted together with the speech, would 
disappear when the radio transmission is switched off, resulting in a modulation of the background noise. Since the 
DTX switching can take place rapidly, it has been found that this effect may be annoying for the listener, especially in a 
car environment with high background noise levels. In bad cases, the speech may be hardly intelligible. 

The present document specifies a solution to overcome this problem by generating synthetic noise similar to the 
transmit (TX) side background noise on the receive (RX) side. The comfort noise parameters are estimated on the TX 
side and transmitted to the RX side before the radio transmission is switched off and at a regular low rate afterwards. 
This allows the comfort noise to adapt to the changes of the noise on the TX side. 
5 Functions on the transmit (TX) side



The comfort noise evaluation algorithm uses the following parameters of the GSM half rate speech encoder, defined in
GSM 06.20 [2]:

the unquantized frame energy value R0;

the unquantized (normalized) autocorrelation sequence R(i) derived from the optimal reflection coefficients r_i;

the quantized energy tweak parameter GS.

These parameters give information on the level (R0 and GS) and the spectrum (R(i)) of the background noise.

Two of the evaluated comfort noise parameters (R0 and R(i)) are encoded into a special frame, called a Silence
Descriptor (SID) frame, for transmission to the RX side. The energy tweak parameter GS can be evaluated in the
encoder and in the decoder in the same way, as given in subclause 5.1; therefore no transmission of GS is necessary.

The SID frame also serves to initiate the comfort noise generation on the RX side, as a SID frame is always sent at the 
end of a speech burst, i.e. before the radio transmission is terminated. 

The scheduling of SID or speech frames on the radio path is described in GSM 06.41 [3]. 

5.1 Background acoustic noise evaluation 

The comfort noise parameters to be encoded into a SID frame are calculated over 8 consecutive frames marked with
Voice Activity Detector (VAD) flag = "0", as follows:

The frame energy values shall be averaged according to the equation: 

$$\mathrm{mean}(R0[j]) = \frac{1}{8} \sum_{n=0}^{7} R0[j-n]$$

where:

R0[j] is the frame energy value of the current frame j (n=0);

R0[j-n] is the frame energy value of the previous frames (n=1,...,7);

n is the averaging period index, n=0,1,...,7;

j is the frame index.

The averaged value mean(R0[j]) is encoded using the same encoding table that is also used by the GSM half rate speech
codec for the encoding of the non-averaged R0 values in ordinary speech encoding mode.
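
As a rough illustration of the frame energy averaging above, the following Python sketch computes the mean over the 8 most recent frames. It uses floating point for readability, whereas the normative low level description is the fixed-point ANSI C code of GSM 06.06 [5]. The function name `mean_frame_energy` and the sample values are assumptions made for this example only.

```python
from collections import deque

AVERAGING_FRAMES = 8  # frames j, j-1, ..., j-7


def mean_frame_energy(r0_history):
    """Return 1/8 times the sum of the last 8 frame energy values."""
    if len(r0_history) != AVERAGING_FRAMES:
        raise ValueError("exactly 8 frame energy values are required")
    return sum(r0_history) / AVERAGING_FRAMES


# Hypothetical unquantized frame energies of frames j-7 ... j (illustrative only).
history = deque([10.0, 12.0, 11.0, 9.0, 10.0, 12.0, 8.0, 8.0],
                maxlen=AVERAGING_FRAMES)
mean_r0 = mean_frame_energy(history)
```

A bounded `deque` is used here so that appending the energy of a new frame automatically discards the oldest one; this is a design choice of the sketch, not something the present document prescribes.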

The (normalized) autocorrelation sequence R(i) shall be averaged according to the equation: 

$$\mathrm{mean}(R[j](i)) = \frac{1}{8} \sum_{n=0}^{7} R[j-n](i), \quad i = 0,1,2,\dots,10$$

where:

R[j](i) is the i'th autocorrelation value of the current frame j (n=0);

R[j-n](i) is the i'th autocorrelation value of one of the previous frames (n=1,...,7);

n is the averaging period index, n=0,1,...,7;

j is the frame index.
The averaged values mean(R[j](i)) are used as input parameters of the Autocorrelation Fixed Point LAttice Technique
(AFLAT) recursion algorithm which calculates the Vector Quantization (VQ) indices of the reflection coefficients, see
GSM 06.20 [2].
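
The per-lag averaging of the autocorrelation sequence can be sketched in Python as follows. This is a floating point illustration only; the AFLAT recursion that consumes the result is defined in GSM 06.20 [2] and is not reproduced here. The function name `mean_autocorrelation` and the test data are assumptions made for this example.

```python
NUM_FRAMES = 8   # frames j, j-1, ..., j-7
NUM_LAGS = 11    # lags i = 0, 1, ..., 10


def mean_autocorrelation(frames):
    """Average each lag i over 8 frames.

    frames: sequence of 8 autocorrelation sequences, each holding 11 values.
    Returns a list of 11 averaged values.
    """
    if len(frames) != NUM_FRAMES or any(len(f) != NUM_LAGS for f in frames):
        raise ValueError("expected 8 frames with 11 autocorrelation values each")
    return [sum(f[i] for f in frames) / NUM_FRAMES for i in range(NUM_LAGS)]


# Hypothetical normalized sequences: R(0) = 1, the other lags set to n/8 in frame n.
example_frames = [[1.0] + [n / 8.0] * 10 for n in range(NUM_FRAMES)]
mean_r = mean_autocorrelation(example_frames)
```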

The SID frame containing the quantization index of mean(R0[j]), the VQ indices of mean(R[j](i)) and the SID
codeword is passed to the radio subsystem instead of frame number j (see subclause 5.3, SID-frame encoding).

The averaging of the energy tweak parameters GS is made on the basis of the quantized GS parameters. The quantized
GS parameters can be derived from the GSP0 indices. These indices are used as pointers to the GSP0 vector
quantization codebook. The GS components of the selected GSP0 vectors are the quantized GS values which will be
averaged.

The quantized energy tweak parameters GS shall be averaged according to the equation: 

$$\mathrm{mean}(GS[j]) = \frac{1}{28} \sum_{n=1}^{7} \left( \sum_{i=1}^{4} GS[j-n](i) \right)$$

where:

GS[j](i) is the quantized energy tweak parameter in subframe i of the current frame j (n=0);

GS[j-n](i) is the quantized energy tweak parameter in subframe i of one of the last frames (n=1,...,7);

n is the averaging period index, n=1,2,...,7;

i is the subframe index, i=1,2,3,4;

j is the frame index.

NOTE: The averaging of GS is made over 7 frames only. 

For each comfort noise insertion period, the averaging of the GS parameters is done only once, before sending the first
SID frame to the decoder; for the rest of the comfort noise insertion period, the averaged value mean(GS[j]) is
frozen.
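
The double sum over 7 frames and 4 subframes can be illustrated with the short Python sketch below. It is a floating point illustration under the assumption that the quantized GS values have already been looked up from the codebook; the function name `mean_gs` and the sample values are hypothetical.

```python
GS_FRAMES = 7      # frames j-1, ..., j-7 (the current frame j is not included)
GS_SUBFRAMES = 4   # subframes i = 1, ..., 4


def mean_gs(gs_history):
    """Return 1/28 times the sum of 7 x 4 quantized GS values."""
    if len(gs_history) != GS_FRAMES or any(len(f) != GS_SUBFRAMES for f in gs_history):
        raise ValueError("expected 7 frames with 4 subframe values each")
    total = sum(sum(frame) for frame in gs_history)
    return total / (GS_FRAMES * GS_SUBFRAMES)


# Hypothetical quantized GS values, identical in every frame for simplicity.
gs_history = [[0.25, 0.5, 0.75, 0.5] for _ in range(GS_FRAMES)]
averaged_gs = mean_gs(gs_history)
```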

Under normal conditions, the averaging of the GS parameters is done during the hangover period. When short speech
bursts are handled, however, the hangover period can be skipped under certain conditions, see GSM 06.41 [3]. In such
cases, the GS parameters of the last seven speech frames marked with SP flag = "1" are averaged.

The hangover period is defined in GSM 06.41 [3]. It is a period added at the end of a speech burst in which no voice
activity is detected (VAD flag = "0"), but the speech encoder remains in speech encoding mode (SP flag = "1") for the
processing of 7 further frames. This hangover period and the first SID frame are used for averaging the comfort noise
parameters contained in the first SID frame.

mean(GS[j]) can be evaluated at the decoder in the same way as in the encoder, because in both the encoder and
decoder, the GSP0 indices of the last 7 speech frames shall be kept in memory. In case of an error free transmission, the
GSP0 indices are identical at the encoder and decoder.

5.2 Modification of the speech encoding algorithm during SID frame generation

When the SP flag is equal to "0", the speech encoding algorithm is modified in the following way: 

the non-averaged reflection coefficients which are used to derive the filter coefficients of the filters H(z) and 
W(z) of the speech encoder are not quantized; 

the unvoiced speech encoding mode is forced. This simplifies the open loop long term prediction processing: 
only the integer lags have to be calculated, no determination of fractional lags is necessary and the frame lag 
trajectory derivation can be avoided; 






3GPP TS 46.022 version 6.0.0 Release 6 



ETSI TS 146 022 V6.0.0 (2004-12) 



no fixed codebook search is made. In each subframe, the indices of both fixed codebooks (CODE1_1,
..., CODE1_4 and CODE2_1, ..., CODE2_4) are replaced by pseudo random numbers uniformly distributed
in [0,127] (7 bit random numbers);

no GSP0 determination is made. The GSP0 codeword is selected as follows:

at the beginning of a comfort noise insertion period, mean(GS[j]) is calculated as defined in subclause 5.1.
Then mean(GS[j]) is quantized, using only the GS component of the GSP0 vector quantization codebook of
the unvoiced speech encoding mode as quantization table. The P0 parameter is not averaged. For this
parameter, the value is used which is associated with the quantized mean(GS[j]) value in the GSP0 codebook
of the unvoiced speech encoding mode. For the rest of the comfort noise insertion period, the GSP0 indices
are frozen.
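
The last two modifications can be sketched in Python as shown below. Several points are assumptions of this example and not statements of the present document: the codebook `TOY_GSP0_CODEBOOK` is invented (the real unvoiced-mode codebook is defined in GSM 06.20 [2]), a nearest-value search is assumed as the quantization rule, and Python's `random` module stands in for the pseudo random generator of the reference code in GSM 06.06 [5].

```python
import random

# Invented (GS, P0) pairs, for illustration only.
TOY_GSP0_CODEBOOK = [(0.1, 0.9), (0.4, 0.6), (0.7, 0.5), (1.2, 0.3)]


def select_gsp0_index(mean_gs_value, codebook):
    """Pick the entry whose GS component is closest to the averaged GS.

    Only the GS component takes part in the search; the P0 value simply
    comes along with the selected entry.
    """
    return min(range(len(codebook)),
               key=lambda k: abs(codebook[k][0] - mean_gs_value))


def random_code_indices(rng, subframes=4):
    """Return one (CODE1, CODE2) pair of 7 bit numbers in [0, 127] per subframe."""
    return [(rng.randrange(128), rng.randrange(128)) for _ in range(subframes)]


gsp0_index = select_gsp0_index(0.5, TOY_GSP0_CODEBOOK)  # frozen for the whole period
associated_p0 = TOY_GSP0_CODEBOOK[gsp0_index][1]
code_indices = random_code_indices(random.Random(0))
```

In this sketch the selected index would be computed once at the start of a comfort noise insertion period and reused for every subframe, while the code indices would be drawn again for each frame.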

A simplified block diagram of the GSM half rate speech encoder in comfort noise insertion mode is shown in figure 1.



[Block diagram not reproduced. Labels recoverable from the figure: s(n), input signal; Codebook; VSELP Codebook 2; Long Term Filter State Update; PN, pseudo noise generator. Legend: the mode is unvoiced; the direct form LPC coefficients and the weighted direct form LPC coefficients are unquantized.]

Figure 1: GSM half rate speech encoder in comfort noise insertion mode



5.3 SID-frame encoding 



The SID frame encoding algorithm exploits the fact that only some of the 112 bits in a frame are needed to code the
comfort noise parameters. The other bits can then be used to mark the SID frame by means of a fixed bit pattern, called 
the SID codeword. 

SID frames are encoded in the encoder output format for voiced frames (MODE = 3), because the two voicing mode 
bits are part of the SID codeword. 

The index of the frame energy value R0 is replaced by the quantization index derived from mean(R0[j]). mean(R0[j]) is
defined in subclause 5.1 and is encoded as described in GSM 06.20 [2].

The VQ indices of the reflection coefficients are replaced by VQ indices derived from mean(R[j](i)). mean(R[j](i)) is
defined in subclause 5.1 and the VQ of the reflection coefficients is described in GSM 06.20 [2].

The SID codeword consists of 79 bits which are all "1". To mark a frame as a SID frame, the parameters in table 1 have 
to be set as shown. 
Table 1: SID codeword

Parameter   Number of bits   Value (Hex)
MODE        2                0x0003
INT_LPC     1                0x0001
LAG_1       8                0x00ff
LAG_2       4                0x000f
LAG_3       4                0x000f
LAG_4       4                0x000f
CODE_1      9                0x01ff
CODE_2      9                0x01ff
CODE_3      9                0x01ff
CODE_4      9                0x01ff
GSP0_1      5                0x001f
GSP0_2      5                0x001f
GSP0_3      5                0x001f
GSP0_4      5                0x001f



The parameters in table 1 are defined in GSM 06.20 [2]. 
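
As a cross-check of table 1, the following Python sketch lists the field widths, confirms that they add up to the 79 bits of the SID codeword, and sets every field to all ones. The dictionary keys are identifiers chosen for this example to mirror the parameters of table 1, and the exact-match test in `carries_sid_codeword` is a simplification: how a receiver classifies frames in the presence of bit errors is not covered by this sketch.

```python
# Field widths in bits, following table 1.
SID_FIELD_BITS = {
    "MODE": 2, "INT_LPC": 1,
    "LAG_1": 8, "LAG_2": 4, "LAG_3": 4, "LAG_4": 4,
    "CODE_1": 9, "CODE_2": 9, "CODE_3": 9, "CODE_4": 9,
    "GSP0_1": 5, "GSP0_2": 5, "GSP0_3": 5, "GSP0_4": 5,
}

TOTAL_SID_BITS = sum(SID_FIELD_BITS.values())


def sid_codeword_fields():
    """Return each SID codeword field with all of its bits set to 1."""
    return {name: (1 << bits) - 1 for name, bits in SID_FIELD_BITS.items()}


def carries_sid_codeword(frame_params):
    """True if every SID codeword field of the frame equals its all-ones value."""
    expected = sid_codeword_fields()
    return all(frame_params.get(name) == value for name, value in expected.items())


sid_fields = sid_codeword_fields()
```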



6 Functions on the receive (RX) side



The situations in which comfort noise shall be generated on the RX side are defined in GSM 06.41 [3]. Comfort noise
generation may be started or updated whenever a valid SID frame is received.

6.1 Averaging of the GS parameters 

When speech frames are received by the decoder, the GS parameters of the last seven speech frames shall be kept in 
memory. As soon as a SID frame is received, these stored GS parameters shall be averaged. The averaged GS value will 
be frozen and used for the actual comfort noise insertion period. 

The averaging procedure works as follows: 

when a speech frame is received, the GSP0 indices are decoded and the decoded GS parts of these parameters
are stored in memory;

when the first SID frame is received, the stored GS values are averaged in the same way as in the speech encoder 
as follows (see also subclause 5.1): 

$$\mathrm{mean}(GS[j]) = \frac{1}{28} \sum_{n=1}^{7} \left( \sum_{i=1}^{4} GS[j-n](i) \right)$$

where:

GS[j](i) is the quantized energy tweak parameter in subframe i of the current frame j;

GS[j-n](i) is the quantized energy tweak parameter in subframe i of one of the last frames;

n is the averaging period index, n=1,2,...,7;

i is the subframe index, i=1,2,3,4;

j is the frame index;
then mean(GS[j]) is quantized, using the GS component of the GSP0 vector quantization codebook for the
unvoiced speech encoding mode as quantization table. The resulting index of this quantization is used for one
complete comfort noise insertion period as GSP0 codeword. The P0 parameter is not averaged. For this
parameter, the value is used which is associated with the quantized mean(GS[j]) value in the GSP0 codebook of
the unvoiced speech encoding mode.
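
The bookkeeping of this subclause (store on speech frames, average on the first SID frame, then freeze) can be sketched in Python as follows. The class name, the method names, the invented GS table and the nearest-value quantization rule are all assumptions of this example; the behaviour when fewer than seven speech frames have been stored is also a choice of the sketch and is not taken from the present document.

```python
from collections import deque


class ReceiveSideGsAverager:
    """Illustrative RX side bookkeeping for the GS parameter."""

    def __init__(self, gs_table):
        self.gs_table = gs_table        # assumed GS component per GSP0 index
        self.stored = deque(maxlen=7)   # decoded GS values of the last 7 speech frames
        self.frozen_index = None        # GSP0 index used during the current period

    def on_speech_frame(self, gsp0_indices):
        # Decode the four subframe indices and keep only the GS parts.
        self.stored.append([self.gs_table[k] for k in gsp0_indices])
        self.frozen_index = None        # the next SID frame starts a new period

    def on_sid_frame(self):
        # Average once, on the first SID frame of the period, then stay frozen.
        if self.frozen_index is None and len(self.stored) == 7:
            mean_gs_value = sum(sum(frame) for frame in self.stored) / 28.0
            self.frozen_index = min(
                range(len(self.gs_table)),
                key=lambda k: abs(self.gs_table[k] - mean_gs_value),
            )
        return self.frozen_index
```

Calling `on_sid_frame` repeatedly within the same period returns the same index, which mirrors the freezing of the GS value for the whole comfort noise insertion period.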

6.2 Comfort noise generation and updating 

The comfort noise generation procedure uses the GSM half rate speech decoder algorithm defined in GSM 06.20 [2]. 
When comfort noise is to be generated, then the various encoded parameters are set as in table 2. 

Table 2: Comfort noise encoded parameters 



Parameter                             Value
MODE
R0                                    interpolation of the values received in the last two valid SID frames
LPC1, LPC2, LPC3                      interpolation of the values received in the last two valid SID frames
INT_LPC                               1
CODE1_1, CODE1_2, CODE1_3, CODE1_4,   pseudo random numbers uniformly distributed in [0,127] (7 bit numbers)
CODE2_1, CODE2_2, CODE2_3, CODE2_4
GSP0_1, GSP0_2, GSP0_3, GSP0_4        index of the averaged GS parameter (calculated at the beginning of each comfort noise insertion period and frozen for the rest of the period)



With these parameters, the speech decoder now performs the standard operations described in GSM 06.20 [2] and 
thereby synthesizes comfort noise. 

Updating of the comfort noise parameters (frame energy and LPC coefficients) occurs each time a valid SID frame is 
received, as described in GSM 06.41 [3]. 

NOTE: The GSP0 codewords are not updated; they are frozen during each comfort noise insertion period.

When updating the comfort noise parameters (frame energy and LPC coefficients), these parameters shall be 
interpolated over the SID update period to obtain smooth transitions. 
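
The text above does not give the interpolation formula, so the Python sketch below simply assumes a linear transition between the values of the last two valid SID frames. The function name, the argument names and the period length of 8 frames used in the example are hypothetical; the normative behaviour is that of the ANSI C code in GSM 06.06 [5].

```python
def interpolate_parameter(previous, latest, frame_in_period, period_length):
    """Linearly move from the previous SID value to the latest one.

    frame_in_period counts frames since the latest valid SID frame and is
    clamped to the range [0, period_length].
    """
    if period_length <= 0:
        raise ValueError("period_length must be positive")
    step = min(max(frame_in_period, 0), period_length)
    weight = step / period_length
    return previous + (latest - previous) * weight


# Hypothetical frame energy values from two consecutive valid SID frames.
halfway = interpolate_parameter(10.0, 20.0, 4, 8)
```

The same function could be applied element by element to a list of LPC-related values; whether the interpolation is carried out on the quantized indices or on decoded values is not decided by this sketch.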



7 Computational details



A low level description has been prepared in the form of ANSI C source code, which is part of GSM 06.06 [5].